add checkm2 #6542

astrovsky01 · 2024-11-08T16:48:48Z

FOR CONTRIBUTOR:

I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
License permits unrestricted use (educational + commercial)
This PR adds a new tool or tool collection
This PR updates an existing tool or tool collection
This PR does something else (explain below)

tools/checkm2/.shed.yml

tools/checkm2/checkm2.xml

bernt-matthias

Excellent timing: One of my users just asked for the tool :)

Could contribute a data manager.

bernt-matthias

Good from my side.

bgruening · 2024-11-19T19:10:00Z

data_managers/data_manager_checkm2/tool-data/checkm2.loc.sample

+#The <version> column indicates the checkm2 version that generated the database
+
+#
+#diamond_db_1.0.2	Diamond database	1.0.2	/mnt/galaxyIndices/Checkm2_database/uniref100.KO.1.dmnd


Is this really a diamond DB?
If so, this is interesting ... should we have a general Diamond location file and DM? with some tag for different tools?

I think so. And I agree that it would be interesting.

But it would be good to know and store the diamond version that was used to generate it, right? That seems difficult to find out from the sources. The tool just downloads the latest version from Zenodo (and I could not even find the link). Let me check whether `diamond dbinfo` could help.

Nice:

> diamond dbinfo -d uniref100.KO.1.dmnd
diamond v2.0.4.142 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org

Database format version = 3
Diamond build = 142
Sequences = 6518230
Letters = 2584051404

Should we do this? Add columns tool, db_format_version, diamond_build?
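
A rough sketch of how a data manager could derive those extra columns from the `diamond dbinfo` output shown above. The function name `parse_dbinfo` and the column names are only assumptions taken from this discussion, not existing code in the PR:

```python
import re


def parse_dbinfo(text):
    """Collect the 'key = value' lines printed by `diamond dbinfo`."""
    info = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?)\s*=\s*(.+?)\s*$", line)
        if match:
            info[match.group(1)] = match.group(2)
    return info


# Output copied from the comment above (banner lines omitted).
sample = """\
Database format version = 3
Diamond build = 142
Sequences = 6518230
Letters = 2584051404
"""

info = parse_dbinfo(sample)

# Hypothetical extra columns for the data table entry.
row_extra = {
    "db_format_version": info["Database format version"],
    "diamond_build": info["Diamond build"],
}
print(row_extra)
```

In a real data manager the text would come from running `diamond dbinfo -d <database>` rather than from a hard-coded string.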

Do we need diamond_build? But yes, we should do that :)

Thanks!

@astrovsky01 do you think you can work on this?

After some digging, I found that Checkm2 doesn't actually work with all diamond databases. It has an internal checksum to make sure it's the specific one from the database download command:

https://github.com/chklovski/CheckM2/blob/319dae65f1c7f2fc1c0bb160d90ac3ba64ed9457/checkm2/versionControl.py#L74

As such, I think that while a general Diamond DB data manager would be good to have, a specific one for checkm2 is also a good idea.
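
For illustration, a check of this kind could look roughly like the sketch below. This is not CheckM2's code: the hash algorithm (SHA-256), the function names and the expected digest are all assumptions, and the linked versionControl.py is the authority on what CheckM2 actually verifies.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Hash a (possibly large) file in chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_expected_database(path, expected_hex):
    """Return True only if the file matches the (assumed) known digest."""
    return file_sha256(path) == expected_hex.lower()
```

A tool that pins its database this way would reject any other `.dmnd` file, which is why a shared table of arbitrary Diamond databases would not be enough on its own.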

Yes, let's go ahead .. I would say.

My feeling is that a general data manager would be too complex and multiple data managers writing to the same data table also seems confusing. Maybe it's better to have tools load multiple data tables?

I was playing around with writing basically extra labels, but that requires someone on the other end to parse the table. Also, depending on the tool, you start to get additional bloat in the requirements. At the very least, checkm2's tool would require its conda package, and that's just extra dependencies for the other tools, even when they don't need it. I think it's a good idea conceptually, but maybe not for this case specifically.

Can you elaborate? I do not understand what you want to say.

writing basically extra labels

What labels?

Also, depending on the tool, you start to get additional bloat in the requirements.

How? We would just load another data table - we need no requirements for this.

Oh, I just mean the situation where tools share the table but can't use all of its entries. I was just referring to the tool label you mentioned at the top of the thread.
And I'd meant requirements for the data manager itself. In this case, you'd need the checkm2 conda package on top of the diamond package, as opposed to just the diamond package in the diamond_build_db data manager that already exists. If other tools operate similarly to checkm2 in the future, they'd need their conda packages added to the data manager's XML.
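
To make the "shared table with a tool label" idea concrete, here is a minimal sketch of the parsing that each consumer would need. The column names, the `tool` column and the second row are hypothetical; only the first row's values come from the loc sample in this PR:

```python
# Assumed column layout; the trailing `tool` column is the hypothetical label.
COLUMNS = ["value", "name", "version", "path", "tool"]

LOC_TEXT = (
    "#The <version> column indicates the checkm2 version that generated the database\n"
    "diamond_db_1.0.2\tDiamond database\t1.0.2\t"
    "/mnt/galaxyIndices/Checkm2_database/uniref100.KO.1.dmnd\tcheckm2\n"
    # Made-up entry standing in for a database that another tool could use.
    "generic_db\tGeneric Diamond database\t1\t/path/to/generic.dmnd\tdiamond\n"
)


def rows_for_tool(loc_text, tool):
    """Return the table rows whose `tool` label matches, skipping comments."""
    rows = []
    for line in loc_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(COLUMNS, line.split("\t")))
        if row.get("tool") == tool:
            rows.append(row)
    return rows


for row in rows_for_tool(LOC_TEXT, "checkm2"):
    print(row["value"], row["path"])
```

Every tool reading the shared table would have to apply a filter like this, which is the extra parsing burden mentioned above.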

Alexander OSTROVSKY added 6 commits November 8, 2024 11:46

add checkm2

cf9551f

typo

15149f8

add fail state because db can't be on github

bde6c07

fix error codes

d96651c

fix

f3ad7c5

fix space

106d6bf

bgruening reviewed Nov 9, 2024

tools/checkm2/.shed.yml

tools/checkm2/checkm2.xml

tools/checkm2/checkm2.xml

tools/checkm2/checkm2.xml

add database

31f692d

astrovsky01 marked this pull request as draft November 13, 2024 00:12

bernt-matthias reviewed Nov 15, 2024

Alexander OSTROVSKY and others added 4 commits November 15, 2024 10:27

bernt-matthias comments

857f362

lint and test fix

7862720

data table tweaks

1942fb2

remove dbkey column rename tables

Merge branch 'checkm2' of https://github.com/astrovsky01/tools-iuc in…

abf47ab

…to checkm2

astrovsky01 marked this pull request as ready for review November 15, 2024 20:15

add re import

28e5048

bgruening reviewed Nov 16, 2024

tools/checkm2/checkm2.xml

tools/checkm2/tool-data/checkm2.loc.sample

tools/checkm2/checkm2.xml

tools/checkm2/checkm2.xml

bernt-matthias added 10 commits November 18, 2024 22:49

add data manager

5b65ffc

Merge branch 'checkm2' of https://github.com/astrovsky01/tools-iuc in…

a312aa2

…to checkm2

add note on testing

c02e5e7

remove extension from pattern

f562d08

and add working output assertions as comment

use format instead of ext

65a1f3d

quote file names

4dac0c7

more precise output label

a5c4daf

switch columns

95c1cd9

add long description

de6485d

fix URL

d877fe4

bernt-matthias approved these changes Nov 19, 2024

bgruening reviewed Nov 19, 2024

bgruening Nov 19, 2024

bernt-matthias Nov 20, 2024

bernt-matthias Nov 20, 2024

bgruening Nov 20, 2024

bgruening Dec 6, 2024

astrovsky01 Dec 12, 2024

bernt-matthias Dec 13, 2024

astrovsky01 Dec 13, 2024

bernt-matthias Dec 13, 2024

astrovsky01 Dec 13, 2024